Boosting Schema Matchers

Authors

  • Anan Marie
  • Avigdor Gal
Abstract

Schema matching is recognized as one of the basic operations required by the process of data and schema integration, and thus has a great impact on its outcome. We propose a new approach to combining matchers into ensembles, called Schema Matcher Boosting (SMB). This approach is based on a well-known machine learning technique called boosting. We present a boosting algorithm for schema matching with a unique ensembler feature, namely the ability to choose the matchers that participate in an ensemble. SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher that is accurate for all schema pairs, a designer can focus on finding better-than-random schema matchers. We provide thorough comparative empirical results showing that SMB outperforms, on average, any individual matcher. In our experiments we compared SMB with more than 30 other matchers over real-world data of 230 schemata, and with several ensembling approaches, including the Meta-Learner of LSD. Moreover, SMB is shown to be consistently dominant, far beyond any other individual matcher. Finally, we observe that SMB performs better than the Meta-Learner in terms of precision, recall, and F-Measure.
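The idea of boosting over a pool of matchers can be illustrated with a minimal sketch. The code below is generic discrete AdaBoost, not the SMB algorithm from the paper, whose details the abstract does not give. It assumes each matcher has already produced a binary match/no-match decision for every candidate attribute pair, and that ground-truth labels are available for training. The function names, matcher names, and toy data are all hypothetical.

```python
import math

def boost_matchers(decisions, labels, rounds=3):
    """Generic AdaBoost over a fixed pool of matchers (illustrative only).

    decisions: dict mapping a matcher name to a list of 0/1 decisions,
               one per candidate attribute pair.
    labels:    list of 0/1 ground-truth values for the same pairs.
    Returns a list of (matcher_name, weight) tuples.
    """
    n = len(labels)
    weights = [1.0 / n] * n          # start with uniform weights over pairs
    ensemble = []
    for _ in range(rounds):
        # Weighted error of every matcher under the current pair weights.
        errors = {
            name: sum(w for w, p, y in zip(weights, preds, labels) if p != y)
            for name, preds in decisions.items()
        }
        best = min(errors, key=errors.get)
        err = errors[best]
        if err >= 0.5:               # no matcher is better than random: stop
            break
        perfect = err == 0.0
        err = max(err, 1e-10)        # avoid division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((best, alpha))
        if perfect:                  # nothing left to correct
            break
        # Increase the weight of pairs the chosen matcher got wrong.
        preds = decisions[best]
        weights = [w * math.exp(alpha if p != y else -alpha)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def ensemble_predict(ensemble, decisions, i):
    """Weighted vote of the selected matchers for candidate pair i."""
    score = sum(alpha * (1 if decisions[name][i] == 1 else -1)
                for name, alpha in ensemble)
    return 1 if score > 0 else 0

# Toy data (hypothetical): six candidate attribute pairs, four matchers.
labels = [1, 0, 1, 0, 1, 0]
decisions = {
    "name":   [1, 0, 1, 0, 0, 0],   # wrong on pair 4
    "type":   [1, 1, 1, 0, 1, 0],   # wrong on pair 1
    "random": [0, 1, 0, 1, 0, 1],   # wrong everywhere
    "struct": [0, 0, 1, 0, 1, 0],   # wrong on pair 0
}
ensemble = boost_matchers(decisions, labels, rounds=3)
predictions = [ensemble_predict(ensemble, decisions, i)
               for i in range(len(labels))]
```

In this toy run each selected matcher is individually imperfect, yet the weighted vote agrees with the labels on all six pairs, and the matcher that never beats random guessing is left out of the ensemble. That mirrors, in a simplified way, the abstract's point that matchers only need to be better than random and that the ensembler can choose which matchers participate.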

Similar resources

XML Matchers: approaches and challenges

Schema Matching, i.e. the process of discovering semantic correspondences between concepts adopted in different data source schemas, has been a key topic in Database and Artificial Intelligence research areas for many years. In the past, it was investigated mainly for classical database models (e.g., E/R schemas, relational databases, etc.). However, in recent years, the widespr...

Calibration and comparison of schema matchers

Schemas used in various environments are becoming more and more numerous, though they do not comply with a universal standard. That is why the task of schema matching has emerged; its main objective is to find means to map one schema into another. Several initiatives have been launched and algorithms have been proposed to solve the problem. They muster highly enticing solutions, though they have several fl...

Instance Matching with COMA++

Schema matching is the process of identifying semantic correspondences between schemas. COMA++ is a matching prototype which uses several characteristics of schemas to determine similarities between them, for example the names and data types of the schema elements and structural information. In this paper we propose two instance-based matchers for COMA++ to gain a further quality improvement. T...

Schema Matching across Query Interfaces on the Deep Web

Schema matching is a crucial step in data integration. Many approaches to schema matching have been proposed so far. Different types of information about schemas, including structures, linguistic features, and data types, have been used to match attributes between schemas. Relying on a single aspect of information about schemas for schema matching is not sufficient. Approaches have been prop...

CMC: Combining Multiple Schema-Matching Strategies Based on Credibility Prediction

Schema matching, which tries to find semantic correspondences between schema elements, is a key operation in data engineering. Combining multiple matching strategies is a very promising technique for schema matching. To overcome the limitations of existing combination systems and to achieve better performances, in this paper the CMC system is proposed, which combines multiple matchers based on ...

Publication date: 2008